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Abstract 



We show that a fast algorithm for the QR factorization of a Toephtz or Hankel matrix 
A is weakly stable in the sense that R^R is close to A^A. Thus, when the algorithm is used 
^ ■ to solve the semi-normal equations R^Rx = A'^b, we obtain a weakly stable method for the 

fT^ I solution of a nonsingular Toeplitz or Hankel linear system Ax — b. The algorithm also applies 

^D ■ to the solution of the full-rank Toeplitz or Hankel least squares problem min \\Ax — 6||2- 
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X ■ 1 Introduction 



Toeplitz linear systems arise in many applications, and there are many algorithms which solve 
nonsingular n x n Toeplitz systems 

Ax = b 

in 0{n'^) arithmetic operations [2, 13, 44, 45, 49, 54, 55, 65, 67, 77, 78, 80, 81, 85, 86]. Some 
algorithms are restricted to symmetric systems {A = A ) and others apply to general Toeplitz 
systems. Because of their recursive nature, most 0{n'^) algorithms assume that all leading 
principal submatrices of A are nonsingular, and break down if this is not the case. These algo- 
rithms are generally unstable, because a leading principal submatrix may be poorly conditioned 
even if A is well conditioned. Thus, stability results often depend on the assumption that A is 
symmetric positive definite, in which case the leading principal submatrices are at least as well 
conditioned as A. 

Asymptotically faster algorithms exist [1, 4, 14, 20, 40, 47, 58, 59]. Sometimes these algo- 
rithms are called superfast [1]. We avoid this terminology because, even though the algorithms 
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require only 0(n(logn)^) arithmetic operations, they may be slower than O(n^) algorithms for 
n < 256 (see [1, 20, 68]). We prefer the term asymptotically fast. 

The numerical stability properties of asymptotically fast algorithms are generally either 
bad [16] or unknown, although some positive partial results have been obtained recently [42]. At- 
tempts to stabilise asymptotically fast algorithms by look-ahead techniques have been made [40] , 
but the look-ahead algorithms are complicated and their worst-case behaviour is unclear. Thus, 
we do not consider asymptotically fast algorithms further, but restrict our attention to O(n^) 
algorithms. 

We are concerned with direct methods for the general case, where A is any nonsingular 
Toeplitz matrix. In this case no 0{n'^) algorithm has been proved to be stable. For example, the 
algorithm of Bareiss [2] has stability properties similar to those of Gaussian elimination without 
pivoting [9, 10, 73, 76], so is unstable and breaks down if a leading principal minor vanishes. 
Several authors have suggested the introduction of pivoting or look-ahead (with block steps) in 
the Bareiss and Levinson algorithms [18, 19, 28, 29, 30, 75, 76], and this is often successful in 
practice, but in the worst case the overhead is 0{n^) operations. The recent algorithm GKO 
of Gohberg, Kailath and Olshevsky [34] may be as stable as Gaussian elimination with partial 
pivoting, but an error analysis has not been published. 

In an attempt to achieve stability without pivoting or look-ahead, it is natural to consider 
algorithms for computing an orthogonal factorization 

A = QR (1) 

of A. The first such O(n^) algorithm was introduced by Sweet [73, 74]. Unfortunately, Sweet's 
algorithm depends on the condition of a certain submatrices of A, so is unstable [8, 56]. Other 
0{n'^) algorithms for computing the matrices Q and R or R~^ in (1) were given by Bojanczyk, 
Brent and de Hoog [8], Chun, Kailath and Lev-Ari [21], Cybenko [23], Luk and Qiao [56, 64], 
and Nagy [60]. To our knowledge none of them has been shown to be stable. In several cases 
examples show that they are not stable. Unlike the classical 0{n^) Givens or Householder 
algorithms, the 0{n'^) algorithms do not form Q in a numerically stable manner as a product of 
matrices which are (close to) orthogonal. 

Numerical experiments with the algorithm of Bojanczyk, Brent and de Hoog (BBH for short) 
suggest that the cause of instability is the method for computing the orthogonal matrix Q; the 
computed upper triangular matrix R is about as good as can be obtained by performing a 
Cholesky factorization of A A, provided the downdates involved in the algorithm are imple- 
mented in a certain way (see §4). This result is proved in §5. As a consequence, in §6 we show 
how the method of semi-normal equations (i.e. the solution of R Rx = A b) can be used to give 
a weakly stable algorithm for the solution of general Toeplitz or Hankel systems. The result also 
applies to the solution of full-rank Toeplitz or Hankel least squares problems. For a discussion 
of the rank-deficient case and a "look-ahead" modification of Cybenko's algorithm, see [43]. 

In §2 we introduce some notation and conventions. The concepts of stability and weak sta- 
bility are defined in §3. The Cholesky downdating problem, which arises in the BBH algorithm, 
is discussed in §4. Numerical results are discussed in §7, and some conclusions are given in §8. 

If i7 is a Hankel matrix, and J is the permutation matrix which on premultiplication reverses 
the order of the rows of a matrix, then JH is Toeplitz. Also, {JH)"^ {JH) = H^H. Thus, our 
results apply equally to Hankel and Toeplitz matrices. Our results might also be extended to 
more general classes of matrices with a displacement structure [50, 51, 52, 53]. For simplicity 
we restrict our attention to the Toeplitz case. 



2 Notation and Conventions 

Let 

(«o • • • Qji-l 

be a real m x n Toeplitz matrix, so Uij = cij-i for 1 < i < m, 1 < j < n. We assume that 
m > n and that yl has rank n. Thus yl A has a Cholesky factorizatfon A A = R R, where R 
is an upper triangular n x n matrix. We assume that the diagonal elements of R are positive, 
so R is unique. Also, A = QR, where Q is an m x n matrix with orthonormal columns. 

If the singular values of A are (Ti,...,(T„, where o"! > . . . > (T„ > 0, then the spectral 
condition number of A is 

K = K2{A) = axjOn- 

For convenience in stating the error bounds, we often assume that a\ is of order unity, which 
can be achieved by scaling. 

A displacement operator D : ^"^^"^ — > sf^{m-i)x(n~i) jg defined as follows: for any m x n 
matrix B, T>{B) = C, where C is the (m — 1) x (n — 1) matrix with entries Cjj = ftj+ij+i — bij, 
1 <i <m,l <j <n. Note that DB = iff -B is Toeplitz. 

Let e be the machine precision. In our analysis we neglect terms of order O(e^). This 
can be justified by considering our bounds as asymptotic expansions in the (sufficiently small) 
parameter e. 

It is often convenient to subsume a polynomial in m and/or n into the "O" notation, and 
indicate this by a subscript. Thus, an error bound of the form 

ll^ll = Om(e) 

means that 

\\E\\ < P{m)e 

for some polynomial P and all sufficiently small e. This notation is useful because minor changes 
in the algorithm or changes in the choice of norm will be absorbed by a change in the polynomial 
P{m). It is often stated (e.g. by Bjorck [6]) that the primary purpose of rounding error analysis 
is insight, and insight can be aided by the suppression of superfluous details.^ 

If the error bound depends on k then this will be mentioned explicitly (e.g. ||£'|| = OmiKs)). 
The meaning of "sufficiently small" may depend on k For example, we may need e < 1/k^. 

We distinguish several classes of numerical quantities - 

1. Exact values, e.g. input data such as Oj. 

2. Computed values, usually indicated by a tilde, e.g. ttj. 

3. Perturbed values given by error analysis, usually indicated by a hat, e.g. Oij, or by the 
argument e, e.g. aij{e). These are not computed, but the error analysis shows that they 
exist and gives bounds on their difference from the corresponding exact values. Sometimes 
the perturbations are of computed quantities, e.g. Ui{e). 
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3 Stability and Weak Stability 

In this section we give definitions of stability and weak stability of algorithms for solving linear 
systems. 

Consider algorithms for solving a nonsingular, nx n linear system Ax = b, so m = n. There 
are many definitions of numerical stability in the literature, for example [5, 6, 9, 12, 16, 22, 36, 
48, 57, 61, 70]. Definitions 1 and 2 below are taken from Bunch [17]. 

Definition 1 An algorithm for solving linear equations is stable for a class of matrices A if for 
each A in A and for each b the computed solution x to Ax = b satisfies Ax = b, where A is close 
to A and b is close to b. 

Definition 1 says that, for stability, the computed solution has to be the exact solution of 
a problem which is close to the original problem. This is the classical backward stability of 
Wilkinson [82, 83, 84]. We interpret "close" to mean close in the relative sense in some norm, 
i.e. 

Ill - A\\/\\A\\ = 0„(e), \\b - b\\/\\b\\ = On{e). (2) 

Note that the matrix A is not required to be in the class A. For example, A might be the 
class of nonsingular Toeplitz matrices, but A is not required to be a Toeplitz matrix. If we 
require A & A we get what Bunch [17] calls strong stability. For a discussion of the difference 
between stability and strong stability for Toeplitz algorithms, see [46, 79]. 

Stability does not imply that the computed solution x is close to the exact solution x, unless 
the problem is well-conditioned. Provided ne is sufficiently small, stability implies that 

|lx-x|l/|lxll = Onine). (3) 

For more precise results, see Bunch [17] and Wilkinson [82]. 

As an example, consider the method of Gaussian elimination. Wilkinson [82] shows that 

\\A-A\\/\\A\\=On{ge), (4) 

where g = g{n) is the "growth factor" . g depends on whether partial or complete pivoting is used. 
In practice g is usually moderate, even for partial pivoting. However, a well-known example 
shows that g{n) = 2^~^ is possible for partial pivoting, and recently it has been shown that 
examples where g{n) grows exponentially with n may arise in applications, e.g. for linear systems 
arising from boundary value problems. Even for complete pivoting, it has not been proved that 
g{n) is bounded by a polynomial in n. Wilkinson [82] showed that g{n) < n^^^"-''^'*' '^•', and 
Gould [38] recently showed that g{n) > n is possible for n > 12; there is still a large gap between 
these results. Thus, to be sure that Gaussian elimination satisfies Definition 1, we must restrict 
A to the class of matrices for which g is 0„(1). In practice this is not a problem, because g can 
easily be checked a posteriori. 

Although stability is desirable, it is more than we can prove for many useful algorithms. 
Thus, following Bunch [17], we define the (weaker, but still useful) property of weak stability. 

Definition 2 An algorithm for solving linear equations is weakly stable for a class of matrices 
A if for each well- conditioned A in A and for each b the computed solution x to Ax = b is such 
that \\x — a:]|/)|a;]] is small. 

In Definition 2, we take "small" to mean 0„(e), and "well-conditioned" to mean that k{A) 
is 0„(1), i.e. is bounded by a polynomial in n. From (3), stability implies weak stability. 



Define the residual r = Ax — b. It is well-known [83] that 



1 ||r|| llx — xll ||r|| , , 

.\\b\\- INI - '^11611 • ^'^ 

Thus, for well-conditioned A, ||x— x||/||x|| is small if and only if ||r||/||6|| is small. This observation 
clearly leads to an alternative definition of weak stability: 

Definition 3 An algorithm for solving linear equations is weakly stable for a class of matrices 
A if for each well- conditioned A in A and for each b the computed solution x to Ax = b is such 
that \\Ax — b\\/\\b\\ is small. 

Now consider computation of the Cholesky factor R of A A, where ^ is an tti x n matrix of 
full rank n. A good 0{mn'^) algorithm is to compute the QR factorization 

A = QR 

of A using Householder or Givens transformations [36] . It can be shown [83] that the computed 
matrices Q, R satisfy 

A = QR (6) 

where Q^Q = I, Q is close to Q, and A is close to A. Thus, the algorithm is stable in the 
sense of backward error analysis. Note that ||^^ — i? i?||/||^ A|| is small, but \\Q — Q\\ and 
||i? — i?||/||i?|| are not necessarily small. Bounds on \\Q — Q\\ and \\R — i?||/||i?|| depend on k, 
and are discussed in [35, 71, 84]. 

A different algorithm is to compute (the upper triangular part of) A A, and then compute 
the Cholesky factorization of A^A by the usual (stable) algorithm. The computed result R is 
such that R R is close to A A. However, this does not imply the existence of A and Q such 
that (6) holds (with A close to A and some Q with Q Q = T) unless A is well-conditioned [72]. 
By analogy with Definition 3 above, we may say that Cholesky factorization of A^A gives a 
weakly stable algorithm for computing R, because the "residual" A A — Iv R is small. 

4 Cholesky Updating and Downdating 

4.1 Updating 

The Cholesky updating problem is: given an upper triangular matrix R G 3??"^" and a vector 
X G 9?", find an upper triangular matrix U such that 

U^U = R^R^xx^ . (7) 

The updating problem can be solved in a numerically stable manner by transforming the matrix 

( x^ \ f U \ 

to upper triangular form j^ by applying a sequence of plane rotations on the left. 

For details, see [36]. 

4.2 Downdating 

The Cholesky downdating problem is: given a upper triangular matrix R G s)[J"X" and a vector 
X G 3?" such that R R — xx is positive definite, find an upper triangular matrix U such that 

U^U = R^R - xx^. (8) 

Proceeding formally, we can obtain (8) from (7) by replacing x by ix. However, the numerical 
properties of the updating and downdating problems are very different. The condition that 



li^ R — xx^ be positive semi-definite is necessary for the existence of a real U satisfying (8). 
Thus, we would expect the downdating problem to be illconditioned if IV R — xx has small 
singular values, and Stewart [72] shows that this is true. 

There are several algorithms for the Cholesky downdating problem [3, 7, 11, 12, 24, 25, 32, 
36, 62, 66] and we shall not discuss them in detail here. What is relevant to us is the error 
analysis. To simplify the statement of the error bounds, suppose that ||-R|| = 0„(1), which 
implies that ||a;|| = 0„(1). Observe that, in exact arithmetic, there is an orthogonal matrix Q 
such that 

Suppose the computed upper triangular matrix is U . Stewart [72] shows that, for the "Linpack" 
algorithm [24], 

where Q(e) is an exactly orthogonal matrix, 

\\x{e)-x\\=On{e), (11) 

and 

\\U{e)-U\\=On{e). (12) 

We can regard x{e) as a (backward) perturbation of the input data x, and U{e) as a (forward) 
perturbation of the computed result U . Because of this mixture of forward and backward 
perturbations, a result of this form is sometimes called a "mixed" stability result. If the problem 
is illconditioned, the backward perturbation is more significant than the forward perturbation. 

It is important to note that the error analysis does not show that the computed matrix U is 
close to the exact result C/, or that Q{£) is close to Q, unless U is well-conditioned. 

A stability result of the same form as (10-12) has been established by Bojanczyk, Brent, 
Van Dooren and de Hoog [7] for their "Algorithm C". Because Algorithm C computes U one 
row at a time, several (updates and/or) downdates can be pipelined, which is not the case for 
the Linpack algorithm^. 

The result (10-12) implies that 

tj'^U = R^R - xx^ + G{e) , (13) 

where 

\\G{e)\\=On{e). (14) 

Clearly a similar result holds if a sequence of (updates and) downdates is performed, provided 
that the final and intermediate results are positive definite. 

5 A Weak Stability Result for the BBH Algorithm 

Our main result (Theorem 1 below) is that the BBH algorithm computes R about as well as 
would be expected if the Cholesky factorization of ^-^A were computed by the usual (stable) 
algorithm. More precisely, the computed R satisfies 

A^A-lfR = Ora{e\\A^A\\). (15) 



^The recent algorithm of Bischof et al [3] can also be pipelined, but we do not know if it gives results which 
satisfy equations (10-12) or (13-14). 



To avoid having to include ||^'^A|| in the error bounds, we assume for the time being that 
aM) = 0{l). 

First consider exact arithmetic. We partition A in two ways: 
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where A-i is an [m — 1) x (n — 1) Toephtz matrix, 

(ai, . . . ,an-i) , 
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Similarly, we partition R in two ways: 
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n,i 



) fln— m— ly 



Rt 



Rb 
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(17) 



where Rf, and Rt are (n — 1) x (n — 1) upper triangular matrices, 

(n,2,---,n,n)'^, 






T 



V'"l,n) • • • ) ^71— l,nj • 



From (16), 

A^A 



Cq + z'^z 



aoy +z^A_i 



Ai^A_i + yy 



T 



A'LiA^i + zz'^ 



aoy + A'^iZ 
where the dots indicate entries which are irrelevant for our purposes. Similarly, from (17), 



(18) 



1,1 



R^R = - 
Equating A^A and R^R, we obtain 



n,i« 



T 



T 



Rt Rt 



Rb Rb + uu 



2 2 , T 

n,i = 00 + ^ z, 



A'^^A^i + yy'^ = Rb'^Rb + uu^ 



and 

Eliminating A_iA^i from (22-23) gives the relation 



A^^A^i + zz'^ = Rt'^Rt. 



Rb Rb — Rt Rt + 



t nt + yy 
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(19) 
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(23) 
(24) 



which is the basis for the BBH algorithm. If Rt were known, then Rb could be computed 
from (24) using one Cholesky updating step and two Cholesky downdating steps, as discussed 
in §4. Also, since updating and downdating algorithms can proceed by rows, knowledge of the 
first k rows of Rt is sufficient to allow the computation of the first k rows of Rb- It is easy 
to compute the first row of R from relations (20-21). (For future reference, suppose that the 



computed first row of R is (fi^i,n^).) It is clear from (17) that the fc-th row of Ri, defines the 
{k + l)-th row of Rt- Thus, we can compute Rt and Ri, row by row. For details see [8]. 

A straightforward extension of the result (13-14) applies to our problem of computing Rt 
and Rh. Provided the "Algorithm C" variant of downdating [7] is used, the computed results 
Rt and Rf, satisfy 



RfRb = RjRt + yy^ 



uu 



zz' + G{e) 



where 



||G(e)||=0™(e). 



(25) 



(26) 



Here y, z and u are inputs to the up/downdating procedures (u may differ slightly from the 
exact u because u has been computed from (20-21)). At this point we make no claims about 
the size of \\Rb — Rb\\ and \\Rt — Rt\\- All we need is that Rb and Rt exist and are bounded for 
sufficiently small e. 

Because of the algorithm for their computation, the computed matrices Rt and Rb are related 
so that we can define the "computed R" , say R, in a consistent manner by 
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n,i 



-,T 



Rt 



Rb 



lT 



(27) 



From (27) we have 



R^R 



1,1 






RjRb + uif 



Rt Rt 



Recall our definition of the operator T> in §2. From (28) we have 

V{R^R) = R^Rb + uu^ - R^Rt. 



V{R'R)=yy' -Yz' +G(e). 



\T 



,.T ^^T 



Thus, from (25), 

Also, from (18), 

D(A^ A) = yy^ -zz^ 

\iE = R^R - A^A and F = V{E) then, from (30-31), 

F = G{e). 
If 1 ^ i ^ ^ ^ ^ then, by the definition of T>{E), 



(28) 

(29) 

(30) 
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(32) 
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i—k,j—k) 
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i-l,j~l 



+ /, 



i-2,j 
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J-j+1,1- 



fc=l 



The first row of R^R is ri^i(fi^i, n^), which is close to ri^i(ri^i,ti-^), so the first row of E has 
norm Om{£)- Also, E is symmetric. It follows that 

ll^ll <{n-l)\\F\\+Ora{e) = Om{e). 

Thus, after scaling to remove our assumption that ai = 0(1), we have proved: 

Theorem 1 // the BBH algorithm is used with the downdating steps performed as in 
"Algorithm C" of [7], then the computed Cholesky factor R of A A satisfies 



\R^R-A^A\\ = Om{£\\A^A\\ 



(33) 



Since A^A = R^R, a comparison of the partitioned forms (19) and (28) shows that 

WRfRh - Rb^RbW = OrrMR^RW) (34) 

and 

\\RjRt - Rt^RtW = Ora{e\\R^R\\). (35) 

From (33) and Stewart's perturbation analysis [72], it follows that 

\\R-R\\/\\R\\ = 0^{Ke). (36) 

Note that the condition number k appears in (36) but not in (33-35). 

6 The Semi-Normal Equations 

Suppose that our aim is to solve a nonsingular n x n Toeplitz linear system 

Ax = b, (37) 

using 0{v?) arithmetic operations. In exact arithmetic, the normal equations 

£Ax = A^h (38) 

and the semi-normal equations 

R^Rx = A^b, (39) 

where R satisfies (1), are equivalent to (37). 

In most circumstances the use of the normal or semi-normal equations is not recommended, 
because the condition number k{A A) may be as large as k{A)'^ (see §5.3 of Golub and Van 
Loan [36]). When A is Toeplitz (but not symmetric positive definite) we can justify use of 
the semi-normal equations. This is because the usual stable algorithms for solving (37) directly 
require 0{n^) arithmetic operations, but we can use the algorithm of §5 to compute (a numerical 
approximation R to) R in 0{n^) operations, and then solve the seminormal equations (39) in 
an additional 0{n^) operations. 

From Theorem 1, we can compute an upper triangular matrix R such that (33) holds. We can 
also compute an accurate approximation d to d = A^b in Oln"^) operations (using the obvious 
algorithm) or in O(nlogn) operations (using the Fast Fourier Transform). Now solve the two 
triangular systems R^w = d and Rx = w. We expect to obtain a result x for which 

||x-x||/||x|| =0„(K^e), (40) 

where k = n{A), provided n^e <^ 1. The residual r = Ax — b should satisfy 

II^IIIfII 

because ||^ r|| = ||yl Ax — A b\\ = 0„(e||A yl||||x||). From (40), the method is weakly stable 
(according to Definition 2), although we can not expect the stronger bound (3) to be satisfied. 
The bounds (40-41) are similar to those usually given for the method of normal equa- 
tions [36], not those usually given for the method of semi-normal equations [5, 61, 66]. This is 
because, in applications of the semi-normal equations, it is usually assumed that R is computed 
via an orthogonal factorization of A, so there is a matrix A such that R R = A A and 

\\A-A\\/\\A\\=On{e). (42) 



However, in our case we only have \\E!^ R — E!^ R\\/\\li^ R\\ = On{s), which imphes the weaker 
bound 

\\A-A\\/\\A\\=On{Ke) (43) 

by Stewart's perturbation analysis [71, 72]. 

An alternative to the use of (39) was suggested by Paige [61]: compute R such that 
li^ R = AA^ , solve RT^ Rw = 6 by solving two triangular systems, then set x = A^w. We 
prefer to use (39) because it is also applicable in the rectangular (least squares) case - see §6.2. 

6.1 Storage Requirements 

The algorithm described in §6 for the solution of the semi-normal equations (39) requires working 
storage 0{'n?) words, because the upper triangular matrix R is not Toeplitz. However, it is 
possible to reduce the storage requirement to 0{n) words. Recall that R is generated row by 
row. Thus, we can solve IVw = d as Ris generated, accumulating the necessary inner products 
with 0{n) storage. We now have to solve Rx = w without having saved R. Provided the 0{n) 
rotations defining the updates and downdates have been saved, we can regenerate the rows of 
R in reverse order (n, n — 1, . . . , 1) and solve Rx = w as this is done. A similar idea was used 
in [15] to save storage in a systolic implementation of the Bareiss algorithm. 

The regenerated matrix R' differs slightly from R because of rounding errors. Numerical 
experiments suggest (though we have not proved) that \\R' — R\\ = Onins)- If true, this would 
imply that the method is weakly stable, and (40) would hold, but the right hand side of (41) 
would need to be multiplied by k. 

An alternative is to use an idea which was suggested by Griewank [39] in a different context. 
Suppose we have a procedure J-'{k) which generates k consecutive rows of i? in a forward direction 
(say rows d+1, . . . ,d + k), and B{k) which generates k consecutive rows in a backward direction 
(say rows d+ k, . . . ,d+ 1). In each case 0{n) words of storage suffice for the initial conditions. 
Then B{2k) can be defined recursively - 

1. Generate rows d + 1, . . . , d + /c using J-'{k), with row d for initial conditions if d > 0, saving 
row d + k. 

2. Generate rows d + 2A;, . . . , d + /c + 1 using B{k) with row d + k for initial conditions. 

3. Generate rows d + k, . . . ,d + 1 using B{k) with row d for initial conditions if d > 0. 

From the discussion in §5, J-'{k) requires 0{kn) operations and 0{n) storage. Thus, we can 
prove by induction that B{k) requires 0{kn\ogk) operations and 0{n\ogk) storage. Overall, 
to generate the n rows of R in reverse order takes 0(n^ log n) operations and 0(n log n) stor- 
age. Thus, at the cost of a factor O(logn) in the operation count, we can reduce the storage 
requirements from 0{n^) to 0(n log n). The numerical properties of this method are exactly the 
same as those of the method which uses 0{n'^) storage, since exactly the same rows of R are 
computed. The scheme described here is not optimal in its use of storage, but is within a factor 
of two of the optimal scheme described in [39] . 

6.2 Toeplitz Least Squares Problems 

If ^ € sjj™^"- is Toeplitz with full rank n, then the semi-normal equations (39) may be used to 
solve the least squares problem 

min||Ax - 6II2. (44) 

The use of semi-normal equations for the general full-rank linear least squares problem is dis- 
cussed in detail by Bjorck [5], and the only significant difference in our case is that the bound (43) 
holds instead of (42), so an additional factor k appears in some of the terms in the error bounds. 
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6.3 Operation Counts 

We briefly estimate the number of arithmetic operations required by the BBH algorithm and 
some of its competitors. We assume that the Toeplitz matrix A is real. Some of the operation 
counts can be reduced by using fast Givens transformations [31, 36], but for simplicity we ignore 
this possibility. We only count multiplications; the number of additions is comparable. 

For simplicity, first consider the case m = n. In the BBH algorithm, the computation of R 
takes Iv? + 0{n) multiplications. The computation of A^h takes n? multiplications if done in 
the obvious way, or 0(n log n) multiplications if the FFT is used. Solving two triangular systems 
takes v? + 0{n) multiplications. Thus, to solve a Toeplitz linear system by the method of §6 
takes 9n^ + 0{n) multiplications, or 8n^ + O(nlogn) if the FFT is used. This is much cheaper 
than the 19n^ + 0{n) multiplications of the BBH QR factorisation algorithm given in §3 of [8], 
which computes Q explicitly. Thus, considering both speed and stability, it is best to avoid the 
computation of Q. 

For the method of Nagy [60], the multiplication count is 16n^ + 0{n), and for Cybenko's 
method [23] it is 23n^ + 0{n). The method TpH of [34] requires 21n^/2 + 0(n log n) real mul- 
tiplications, and the method GKO of [34] requires 13n^/2 + O(nlogn) complex multiplications. 
Thus, the method of §6 should be faster than any of these methods, although GKO may be 
competitive if the Toeplitz matrix A complex. 

For the rectangular case (m > n), the corresponding multiplication counts (omitting low- 
order terms and assuming that the FFT is not used) are: 

2mn + Iv? for the method of §6; 

2mn -|- 9n^ for the method of Nagy [60] using the semi-normal equations; 

Imn -|- 14n^ for the method of Nagy [60] using inverse factorisation (algorithm IF); 

9mn -|- 14n^ for the method of Cybenko [23]; 

13mn -|- 1v? for the method of [8], computing Q explicitly. 
For details of the components making up these operation counts, see Table 4.1 of [60]. 

6.4 Iterative Refinement 

Iterative refinement (sometimes called iterative improvement) can be used to improve an ap- 
proximate solution X to the linear system (37) or the linear least squares problem (44). In 
practice this gives an accurate solution in a small number of iterations so long as the residual is 
computed accurately and the working precision is sufficient to ensure convergence. For details 
see [5, 36, 37, 48, 69, 84]. 

A related idea is to improve the accuracy of R^ and Rt by using Bjorck's "Corrected Semi- 
Normal Equations" [5, 63] or Foster's scheme of iterative improvement [26]. However, if the 
aim is simply to solve a linear system, then it is more economical to apply iterative refinement 
directly to the system. 

6.5 Ill-conditioned Problems 

For very ill-conditioned Toeplitz linear systems and least squares problems, it may be desirable 
to use regularisation, as discussed in §5 of [60]. We do not consider this here, except to note that 
our algorithm can easily be modified to compute the Cholesky factorisation of A^ A -|- a/, where 
a is a positive regularisation parameter. Only small changes in equations (20-23) are required. 

7 Numerical Results 

The algorithm described in §§5-6 has been implemented in Pascal on an IBM PC and DEC 
VAX. In Table 1 we give some results for randomly chosen n x n Toeplitz systems on a DEC 
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VAX with e = 2 ^^ ~ 1.4 x 10 "^"^ . The elements at defining the Toephtz matrix 

ao • • • On-l 

ai-n • • • Go 



I 



were chosen from a normal distribution with specified mean ^ and standard deviation a. (The 
condition number k(^) tends to increase with |/i/o"|.) The solution vector x was chosen with 
normally distributed components (mean 0) and the vector h computed from b ^ Ax. A conse- 
quence is that ||x|j/||6|| is unlikely to be large, even if A is poorly conditioned, but this is typical 
of most applications. Table 1 gives the condition number 



Ki{R) = \\R\\i ■ \\R-^\ 



1, 



which is a rough approximation to k.2{R) = K2{A) (the 1-norm was used for computational 
convenience) . 



Table 1 gives 



ei 



62 



es 



\\rTr-A^A\\i 
e\\ATA\U ' 

\\x — x\\2 

eKi(i?)2||x|l2 ' 
eKi(i?)||A||i||x||2' 



where r = Ax — b. From Theorem 1 and the bounds (40-41), we expect these quantities to be 
bounded by (low degree) polynomials in n. The results confirm this. 

For comparison, the last column of Table 1 gives e§, the value of the normalised residual 63 
obtained via Cholesky factorization of A A. It can be seen that 63 is not much larger than 63. 

We also tried Toeplitz matrices A with some singular principal submatrices (e.g. A with 
a_i = ao = ai). The results were similar to those given in Table 1. 

8 Conclusion 

The method described in §6 for the solution of general nonsingular Toeplitz or Hankel linear 
systems^ requires 0{n'^) operations, is weakly stable, and makes no assumption about the con- 
ditioning of submatrices of A. We do not know any other methods which have been proved to 
be stable or weakly stable and have worst-case time bound 0{n'^). Algorithms which involve 
pivoting and/or look-ahead [18, 27, 41, 43, 76] may work well in practice, but seem to require 
worst-case overhead 0{n^) to ensure stability. 

Our method should be faster than 0{n'^) methods which ignore the Toeplitz structure, even if 
the working precision has to be increased (i.e. e reduced) in our method to ensure that K^e <C 1. 
Storage requirements are 0{n'^), but may be reduced to 0(n log n) or 0{n) as discussed in §6.1. 

We have not looked in detail at all the fast Toeplitz QR factorization algorithms mentioned 
in §1. It is quite likely that some of these give weakly stable algorithms for the computation of 
the Cholesky factor R of A^ A, but no proofs have been published. 

It remains an open question whether there is a fast Toeplitz QR algorithm which is backward 
stable for the computation of Q and R (or R~^). Such an algorithm would give a stable algorithm 
for the solution of Ax = b without recourse to the semi- normal equations. 



^Similar remarks apply for full-rank Toeplitz or Hankel least squares problems. 
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n 


/i/cr 


K,{R) 


ei 


62 


63 


4 


50 


0.0 


3.3'3 


6.7 


9.2'-4 


5.2'-3 


1.4' 


-3 


50 


1.0 


3.6'3 


1.7'1 


6.2'-3 


2.7'-2 


2.1' 


-3 


50 


l.O'l 


9.8'2 


4.0'1 


3.1'-1 


3.7'-l 


9.4' 


-2 


50 


1.0'2 


1.0'4 


1.0'2 


2.0'-l 


5.3'-l 


7.0' 


_2 


50 


1.0'3 


1.5'5 


3.4'1 


4.6'-l 


3.0'-l 


4.6' 


-2 


50 


1.0'4 


9.1'5 


1.0'2 


1.0 


1.2 


9.2' 


-2 


50 


1.0'5 


6.3'6 


9.2 


1.4'-1 


1.6'-1 


2.9' 


-1 


100 


0.0 


1.2'3 


1.4'1 


6.2'-4 


4.7'-3 


1.6' 


-3 


100 


1.0 


1.8'3 


1.5'1 


5.2'-4 


4.4'-3 


3.1' 


-3 


100 


l.O'l 


5.2'3 


8.7'1 


8.5'-2 


1.6'-1 


3.3' 


-2 


100 


1.0'2 


3.3'4 


1.2'2 


4.2'-l 


3.7'-l 


3.5' 


-2 


100 


1.0'3 


4.0'5 


1.4'2 


2.2'-l 


6.6'-l 


4.8' 


-2 


100 


1.0'4 


8.8'6 


1.5'2 


1.0 


8.9'-l 


2.6' 


_2 


100 


1.0'5 


7.8'6 


3.8'1 


5.5'-l 


8.4'-l 


1.4' 


-1 


200 


0.0 


5.7'3 


1.9'1 


1.3'-4 


1.7'-3 


8.7' 


-4 


200 


1.0 


1.2'6 


8.4'1 


1.0'-5 


1.9'-4 


1.6' 


-3 


200 


l.O'l 


5.6'5 


1.6'1 


3.6'-3 


7.0'-3 


4.7' 


-3 


200 


1.0'2 


2.4'4 


2.1'2 


3.0 


2.7 


1.4' 


-1 


200 


1.0'3 


7.9'5 


1.4'1 


8.2'-2 


6.6'-2 


5.2' 


-2 


200 


1.0'4 


5.5'6 


2.8'2 


6.1'-1 


5.0'-l 


4.3' 


-2 


200 


1.0'5 


1.3'8 


3.6'2 


1.8'-1 


3.8'-l 


4.3' 


-2 



Table 1: Weakly stable solution of Toeplitz systems 
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Postscript 

At the time of writing this paper we were not aware of a paper by S. Cabay and R. Meleshko 
[A weakly stable algorithm for Pade approximants and the inversion of Hankel matrices, SIAM 
J. Matrix Anal. Appl. 15 (1993), 735-765], which gives a different weakly stable algorithm for 
the inversion of Toeplitz/Hankel matrices, usually (but not always) in 0{'n?) operations. 
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